Exploring Social Context for Topic Identification in Short and Noisy Texts

Authors

  • Xin Wang
  • Ying Wang
  • Wanli Zuo
  • Guoyong Cai
Abstract

With the pervasiveness of social media, topic identification in short texts has attracted increasing attention in recent years. However, social media texts are inherently short and noisy, and their structure is sparse and dynamic, which makes it difficult to identify topic categories accurately. Inspired by social science findings that preference consistency and social contagion are observed in social media, we investigate topic identification in short and noisy texts by exploring social context from a social science perspective. In particular, we present a mathematical optimization formulation that incorporates the preference consistency and social contagion theories into a supervised learning method, and we perform feature selection to cope with short and noisy texts; the result is a Sociological framework for Topic Identification (STI). Experimental results on real-world datasets from Twitter and Citation Network demonstrate the effectiveness of the proposed framework. Further experiments examine the importance of social context in topic identification.
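The abstract does not give the STI objective itself. Purely as an illustration, the sketch below assumes one common way to combine the ingredients it names: a least-squares topic classifier `W` (features × topics), a graph-Laplacian term that pushes socially linked posts toward similar topic scores, and an ℓ2,1-norm penalty that zeroes whole feature rows (feature selection), i.e. minimize ||XW − Y||_F² + α·tr(WᵀXᵀLXW) + β·||W||₂,₁, solved by iteratively reweighted least squares. The post graph `A` (e.g. same-author links for preference consistency, follower or citation links for social contagion), the weights `alpha` and `beta`, and all function names are hypothetical; this is not the authors' formulation.

```python
import math

# Hypothetical sketch (not the STI paper's method): least-squares topic
# classifier with a graph-Laplacian social-context term and an l2,1-norm
# feature-selection term.
# Objective: ||XW - Y||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W||_{2,1}


def matmul(A, B):
    """Plain-Python matrix product for lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(M, B):
    """Solve M Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0.0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def laplacian(A):
    """Graph Laplacian L = D - A of a symmetric post-to-post adjacency matrix."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]


def fit_sketch(X, Y, A, alpha=0.1, beta=0.1, iters=30, eps=1e-8):
    """Iteratively reweighted least squares for the objective above."""
    Xt = transpose(X)
    G = matmul(Xt, X)                         # X^T X
    S = matmul(matmul(Xt, laplacian(A)), X)   # X^T L X (social smoothness)
    XtY = matmul(Xt, Y)
    d = len(G)
    D = [1.0] * d                             # first pass uses D = I
    W = []
    for _ in range(iters):
        # Stationarity condition: (X^T X + alpha X^T L X + beta D) W = X^T Y
        M = [[G[i][j] + alpha * S[i][j] + (beta * D[i] if i == j else 0.0)
              for j in range(d)] for i in range(d)]
        W = solve(M, XtY)
        # Reweight D_ii = 1 / (2 ||w_i||), smoothed by eps to avoid division by 0
        D = [1.0 / (2.0 * math.sqrt(sum(w * w for w in row) + eps)) for row in W]
    return W


def predict(X, W):
    """Assign each post the topic with the highest score in XW."""
    return [max(range(len(s)), key=s.__getitem__) for s in matmul(X, W)]


if __name__ == "__main__":
    # Toy data: features 0 and 1 are topic words, feature 2 is noise.
    X = [[1.0, 0.0, 0.1], [1.0, 0.0, -0.1], [0.0, 1.0, 0.1], [0.0, 1.0, -0.1]]
    Y = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # one-hot topics
    # Hypothetical post graph: posts 0-1 and 2-3 are socially linked.
    A = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    W = fit_sketch(X, Y, A)
    print(predict(X, W))  # expected: [0, 0, 1, 1]
```

On this toy input the noise feature ends up with an all-zero row in `W`, so it is effectively deselected; on less contrived data it is the ℓ2,1 penalty that pushes weak feature rows toward zero, while `alpha` controls how strongly linked posts are pulled toward the same topic.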

Similar resources

Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations

Short texts are a prevalent form of social communication on the Internet, and billions of them are generated every day. Discovering knowledge from them has gained a lot of interest from both industry and academia. Short texts have limited contextual information, and they are sparse, noisy and ambiguous; hence, automatically learning topics from them remains an important challenge. To tackle ...

Microblog sentiment analysis using social and topic context

Analyzing massive volumes of user-generated microblogs is crucial in many fields and has attracted many researchers. However, it is very challenging to process such noisy and short microblogs. Most prior works use only the text to identify sentiment polarity and assume that microblogs are independent and identically distributed, ignoring the fact that microblogs are networked data. Therefore, their performance...

An Analysis of the Function of Non-verbal Communications in Shahnameh

Non-verbal communications could be a part of visual signs that, because of their importance in interpersonal relationships and transition of meaning, are highly regarded by psychologists and sociologists. One of the major subdivisions of this topic is called "Body Language" that has existed in all human societies since ancient times. These non-verbal signs, some thousands of years old, have cul...

Language Identification Based on High Frequency Approaches

This paper deals with the problem of automatic language identification of noisy texts, which is an important task in natural language processing. Several works exist in this field, based on statistical and machine learning approaches for different categories of texts. Unfortunately, most of the proposed methods work well on clean or long texts, but often ...

Exploring the Relationship Between Modality and Readability Across Different Text Types

With regard to the relationship between the use of modality and the readability levels of texts, two opposing views have been raised. The first view endorses a direct positive relationship between modality and readability, in the sense that the use of modality increases textual understandability. The second view is that the use of modality leads to an increase in the number of words, resulting in readabilit...

Publication date: 2015